Based on report20191030 and incorporating Aaron's suggestions, this report makes some modifications to the Exhaustion, MLE & MAP parts.

The related notes are in my notebook, under the topic Review Meeting 10.30.

Exhaustion

Fixing \(\sigma_S^2=1\), we can still cover all possible combinations, because we care about the ratio among the three variances rather than their absolute values.

We can now use the second loss function: \(\frac{\sum(X11_{irregular}-SSM_{irregular})^2}{\sigma_I^2} + \frac{\sum(X11_{trend}-SSM_{trend})^2}{\sigma_T^2} + \frac{\sum(X11_{seasonal}-SSM_{seasonal})^2}{\sigma_S^2}\)

# define the loss function
# x11:   output of seasonal::seas(); series d12 = trend, d10 = seasonal, d13 = irregular
# ssm:   smoothed output of KFAS::KFS()
# sigma: c(sigma_I^2, sigma_T^2, sigma_S^2)
Dif <- function(x11, ssm, data, sigma){
  
  x11_trend <- series(x11, 'd12')
  x11_seasonal <- series(x11, 'd10')
  x11_irregular <- series(x11, 'd13')
  
  ssm_trend <- coef(ssm, states = 'trend')
  # dummy seasonality: minus the sum of the stored seasonal states
  ssm_seasonal <- -rowSums(coef(ssm, states='seasonal'))
  # irregular taken as the residual after removing trend and (lagged) seasonal
  ssm_irregular <- data[-1] - ssm_trend[-1] - ssm_seasonal[-length(data)]
 
  # variance-weighted sum of squared component differences
  D <- sum((x11_irregular[-1]-ssm_irregular)^2)/sigma[1] + 
    sum((x11_trend-ssm_trend)^2)/sigma[2] + 
    sum((x11_seasonal[-1]-ssm_seasonal[-length(data)])^2)/sigma[3] 
    
  return(D)
}
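The quantity `Dif` computes is simply a variance-weighted sum of squared component differences. A minimal sketch of the same computation in Python, with made-up toy component vectors (all names here are illustrative, not from the report's code), is:

```python
def weighted_loss(x11, ssm, sigma):
    """Sum over components of squared differences, each divided by its variance.

    x11, ssm: dicts mapping component name -> list of values
    sigma:    dict mapping component name -> its variance weight
    """
    return sum(
        sum((a - b) ** 2 for a, b in zip(x11[c], ssm[c])) / sigma[c]
        for c in sigma
    )

# made-up toy components
x11 = {"irregular": [1.0, -1.0], "trend": [10.0, 11.0], "seasonal": [2.0, -2.0]}
ssm = {"irregular": [0.5, -0.5], "trend": [10.0, 12.0], "seasonal": [2.0, -1.0]}
sigma = {"irregular": 7.0, "trend": 5.0, "seasonal": 1.0}
print(weighted_loss(x11, ssm, sigma))  # 0.5/7 + 1/5 + 1/1 = 89/70
```

Dividing each component's squared error by its variance down-weights the components that we allow to vary more.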
# define the search function
# grid over sigma_I^2 (H) and sigma_T^2 (trend Q) in {0.2, 0.4, ..., 2},
# with sigma_S^2 (seasonal Q) fixed at 1

exhaustion <- function(data){
  
  Difference <- c()
  index <- c()
  
  x11 <- seas(data, x11 = '')
  for (i in 1:10) {
    for (j in 1:10) {
      
      ssmm <- SSModel(data ~ SSMtrend(1, Q = list(j*0.2)) + 
                        SSMseasonal(12, sea.type = 'dummy', Q = 1),
                      H = i*0.2)
      ssm <- KFS(ssmm)
      
      sigma <- c(i*0.2, j*0.2, 1)
      
      dif <- Dif(x11, ssm, data, sigma)
      
      Difference <- c(Difference, dif)
      index <- rbind(index, sigma)
    }
  }
  
  df <- data.frame(variance = index, difference = Difference)
  return(df)
}
unemp_exhaustion <- exhaustion(unemp)
unemp_exhaustion[which.min(unemp_exhaustion$difference),]
##          variance.1 variance.2 variance.3 difference
## sigma.99          2          2          1    2292023

The grid search above illustrates that, with \(\sigma_S^2=1\) fixed, the best values of the other two variances sit at the top of the grid (both equal to 2). That is to say, we prefer to assign more variability to the irregular and trend components than to the seasonal one. Therefore, we can start the grid at 1, as follows:

# define the search function
# grid over sigma_I^2 and sigma_T^2 in 1:20, with sigma_S^2 fixed at 1

exhaustion <- function(data){
  
  Difference <- c()
  index <- c()
  
  x11 <- seas(data, x11 = '')
  for (i in 1:20) {
    for (j in 1:20) {
      
      ssmm <- SSModel(data ~ SSMtrend(1, Q = list(j)) + 
                        SSMseasonal(12, sea.type = 'dummy', Q = 1),
                      H = i)
      ssm <- KFS(ssmm)
      
      sigma <- c(i, j, 1)
      
      dif <- Dif(x11, ssm, data, sigma)
      
      Difference <- c(Difference, dif)
      index <- rbind(index, sigma)
    }
  }
  
  df <- data.frame(variance = index, difference = Difference)
  return(df)
}
unemp_exhaustion <- exhaustion(unemp)
unemp_exhaustion[which.min(unemp_exhaustion$difference),]
##           variance.1 variance.2 variance.3 difference
## sigma.124          7          5          1    1606151

Thus the ‘best’ value (ratio) is c(7, 5, 1) when the variance grid runs over 1:20. (I also tried the grid 1:100 and the result is still c(7, 5, 1).)

Let’s see the loss around this combination:

##           variance.1 variance.2 variance.3 difference
## sigma.82           5          3          1    1677311
## sigma.83           5          4          1    1642134
## sigma.84           5          5          1    1669829
## sigma.85           5          6          1    1715920
## sigma.86           5          7          1    1766857
## sigma.102          6          3          1    1687292
## sigma.103          6          4          1    1614196
## sigma.104          6          5          1    1623141
## sigma.105          6          6          1    1657874
## sigma.106          6          7          1    1700970
## sigma.122          7          3          1    1727538
## sigma.123          7          4          1    1615880
## sigma.124          7          5          1    1606151
## sigma.125          7          6          1    1629849
## sigma.126          7          7          1    1665519
## sigma.142          8          3          1    1786646
## sigma.143          8          4          1    1635735
## sigma.144          8          5          1    1607265
## sigma.145          8          6          1    1620081
## sigma.146          8          7          1    1648562
## sigma.162          9          3          1    1858335
## sigma.163          9          4          1    1667483
## sigma.164          9          5          1    1620141
## sigma.165          9          6          1    1622138
## sigma.166          9          7          1    1643571

And the results from c(7, 5, 1) are:

As a comparison, let's look at the results when the variances are c(1, 1, 1):

From the details in the plots, the results from c(7, 5, 1) are generally better than those from c(1, 1, 1).

MLE

# define the log-likelihood function
# Because the dummy seasonal component depends on 11 lags,
# the log-likelihood is not computed at the first 11 points.

loglikelihood <- function(data, trend, season, sigma){
  
  n <- length(data)
  # dummy seasonality: the sum of 12 consecutive seasonal values is the
  # seasonal disturbance, so a accumulates its squared values
  a <- 0
  for (i in 12:n)  a <- a + (sum(season[(i-11):i]))^2
  l <- -(n-11)/2 * log(sigma[1]) - 
    (n-11)/2 * log(sigma[2]) - (n-11)/2 * log(sigma[3]) -
    sum((data[-c(1:11)]-trend[-c(1:11)]-season[-c(1:10,n)])^2)/(2*sigma[1]) - 
    sum((trend[-c(1:11)]-trend[-c(1:10,n)])^2)/(2*sigma[2]) - 
    a / (2*sigma[3])
  return(l)
}
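For reference, the expression this function computes (dropping additive constants, and writing \(Y_t\) for the data, \(T_t\) for the trend and \(S_t\) for the seasonal component, matching the lagged-seasonal indexing in the code) is:

\[
\ell = -\frac{n-11}{2}\left(\log\sigma_I^2+\log\sigma_T^2+\log\sigma_S^2\right)
-\frac{\sum_{t=12}^{n}\left(Y_t-T_t-S_{t-1}\right)^2}{2\sigma_I^2}
-\frac{\sum_{t=12}^{n}\left(T_t-T_{t-1}\right)^2}{2\sigma_T^2}
-\frac{\sum_{t=12}^{n}\left(\sum_{j=0}^{11}S_{t-j}\right)^2}{2\sigma_S^2}
\]

Each squared-error sum is the disturbance of one component (irregular, trend level, dummy-seasonal), divided by twice its variance.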

As in the exhaustion part, we first check the case where the other two variances are less than 1:

# define the log-likelihood matrix
# grid over sigma_I^2 and sigma_T^2 in {0.01, 0.02, ..., 1}, with sigma_S^2 = 1
loglikelihood_matrix <- function(data){
  LL <- c()
  index <- c()
  for (i in 1:100) {
    for (j in 1:100) {
      
      ssmm <- SSModel(data ~ SSMtrend(1, Q = list(j*0.01)) + 
                        SSMseasonal(12, sea.type = 'dummy', Q = 1),
                      H = i*0.01)
      ssm <- KFS(ssmm)
      
      ssm_trend <- coef(ssm, states = 'trend')
      ssm_seasonal <- -rowSums(coef(ssm, states = 'seasonal'))
  #   ssm_seasadj <- data[-1] - ssm_seasonal[-length(data)] # length is shorter
      sigma <- c(i*0.01, j*0.01, 1)
      
      ll <- loglikelihood(data, ssm_trend, ssm_seasonal, sigma)
      LL <- c(LL, ll)
      index <- rbind(index, sigma)
    }
  }
  df <- data.frame(variance = index, loglikelihood = LL)
  return(df)
}
unemp_loglikelihoodmatrix <- loglikelihood_matrix(unemp)
unemp_loglikelihoodmatrix[which.max(unemp_loglikelihoodmatrix$loglikelihood),]
##            variance.1 variance.2 variance.3 loglikelihood
## sigma.9999          1          1          1      -4858499

Therefore, we may exclude the possibility that our MLE is less than 1.

Let’s look at the case when variances are greater than 1:

# define the log-likelihood matrix
# grid over sigma_I^2 and sigma_T^2 in 1:100, with sigma_S^2 = 1
loglikelihood_matrix <- function(data){
  LL <- c()
  index <- c()
  for (i in 1:100) {
    for (j in 1:100) {
      
      ssmm <- SSModel(data ~ SSMtrend(1, Q = list(j)) + 
                        SSMseasonal(12, sea.type = 'dummy', Q = 1),
                      H = i)
      ssm <- KFS(ssmm)
      
      ssm_trend <- coef(ssm, states = 'trend')
      ssm_seasonal <- -rowSums(coef(ssm, states = 'seasonal'))
  #   ssm_seasadj <- data[-1] - ssm_seasonal[-length(data)] # length is shorter
      sigma <- c(i, j, 1)
      
      ll <- loglikelihood(data, ssm_trend, ssm_seasonal, sigma)
      LL <- c(LL, ll)
      index <- rbind(index, sigma)
    }
  }
  df <- data.frame(variance = index, loglikelihood = LL)
  return(df)
}
unemp_loglikelihoodmatrix <- loglikelihood_matrix(unemp)
unemp_loglikelihoodmatrix[which.max(unemp_loglikelihoodmatrix$loglikelihood),]
##            variance.1 variance.2 variance.3 loglikelihood
## sigma.9999        100        100          1     -52285.49

After setting the scope to 1:25, 1:50 and 1:100, I find that the MLE always sits at the largest value the grid allows, and the maximum log-likelihoods across the three scopes are in a ratio of roughly 4:2:1. If we write down the log-likelihood, the variances appear in the denominators of the squared-error terms; that is, the component differences (the numerators) do not change much across cases, and the values of the variances themselves play the dominant role in the log-likelihood.
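This boundary behaviour can be checked with a one-dimensional sketch (in Python, with made-up numbers): for a fixed residual sum of squares SS and effective sample size n, the terms involving one variance, \(-\frac{n}{2}\log\sigma^2 - \frac{SS}{2\sigma^2}\), are maximised at \(\sigma^2 = SS/n\). So whenever \(SS/n\) exceeds the top of the grid, the grid maximum lands on the boundary.

```python
import math

def profile_ll(sigma2, ss, n):
    """Log-likelihood terms involving a single variance, for fixed residual SS."""
    return -n / 2 * math.log(sigma2) - ss / (2 * sigma2)

ss, n = 1000.0, 10  # made-up numbers: the analytic maximiser is sigma2 = ss/n = 100
best_small = max(range(1, 51), key=lambda s: profile_ll(s, ss, n))   # grid 1:50
best_large = max(range(1, 201), key=lambda s: profile_ll(s, ss, n))  # grid 1:200
print(best_small, best_large)  # 50 (boundary), 100 (interior optimum)
```

With the residual sums of squares in our data being so large, every grid we tried (1:25, 1:50, 1:100) stops short of the interior optimum, which is why the grid MLE keeps landing on the boundary.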

Let's have a look at these log-likelihoods:

MAP